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Abstract 

The paper describes a continuous second-variation algorithm to solve optimal control problems where the control 
is defined on a closed set. A second order expansion of a Lagrangian provides linear updates of the control to construct 
a locally feedback optimal control of the problem. Since the process involves a backward and a forward stage, which 
require storing trajectories, a method has been devised to accurately store continuous solutions of ordinary differential 
equations. Thanks to the continuous approach, the method adapts implicitly the numerical time mesh. The novel 
method is demonstrated on bang-bang optimal control problems, showing the suitability of the method to identify 
automatically optimal switching points in the control. 

1 Introduction 

The current paper is concerned with the resolution of optimal control problems with bang-bang structure. Most 
optimal control problem solvers use a direct problem formulation[5| that transforms the optimal control problem 
into a nonlinear program! 1| [2||3|[4|. Indeed, one could argue that direct methods offer the most straightforward 
formulation to a generic optimal control solver, and they are quite robust. Alternative methods, for instance indirect 
methodsJQ, require formulating a boundary value problem and convergence is usually difficult. Neither direct, nor 
indirect method-based algorithms can be both robust and accurate in computing the optimal control solution. Besides, 
direct and indirect approach, one should explore second-order methods. 

Second-order methods appear as very good alternative. In the 1960's, while trying to elegantly solve LQP re- 
searchers devised second-order methods such as second-variation methods or differential dynamic programming 
(DDP)[7| |8| |9| [ 1 1 [ 1 1 1 [12|. The classical second order algorithms for optimal control problems use dynamic 
programming equations. Referred as differential dynamic programming (DDP), DDP provides a simple way of com- 
puting an optimal control through a quadratic expansion of the Hamilton-Jacobi-Bellman value function around a 
nominal trajectory [8 1. Over the past 50 years, these methods have also been tested and adapted on non-LQP problems, 
and in the space community, DDP based methods seem to get a new lease of life. Current computer architecture allow 
better programing of the methods thanks to significant increase in memory capacity and processing unit speed. So far, 
Whiffen lfT3llfl4l certainly proposed the best improvement of DDP method to provide a generic tool for high-fidelity 
low-thrust space trajectory optimisation. The major difficulty in low-thrust optimal control problems is the control is 
often of a bang-bang structure because the control is linear in the dynamics. Some work using DDP then consider 
simplifications in the dynamics| 15 1, or consider a quadratic cost to the designing of low-thrust trajectories to asteroids 
to improve convergence rate lfl6l . To the author knowledge there is not a lot of supporting theories for convergence 
and behaviour of DDP. 

Second-order methods can also be devised using Lagrangian expansions, rather than the Bellman's value func- 
tion. The approach has been introduced by Bullock lfTTl ifTSl and Bryson and extended by others for space trajectory 
optimization[19|[20|. Although the basis is different from the derivation of DDP the end results are quite similar. 
Both DDP and gradient-based Lagrangian methods compute a linear control update iteratively using gradient of the 
Hamiltonian function. However, the derivations based on the expansion of the Lagrangian rather than the Hamilton- 
Jacobi-Bellman value function provide a better theoretical framework. 

The algorithm presented in this paper uses as baseline an expansion of the Lagrangian. The novelty of the ap- 
proach is however many fold. This paper proposes an approach for approximating the continuous solution through 
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interpolation, thus improving the backward-forward integrations, and increasing the accuracy of the sensitivity matri- 
ces and the feedback coefficients. A method for handling dynamical control constraints without using penalization is 
presented. Penalization of the control or the state for constraints is one of the major drawback of current DDP and 
second-variation methods. That is, a regularisation of the Lagrangian is also presented to improve the convergence 
rate. 

In section [2] the main problem is introduced with notations and assumptions. The methodology is developed 
in section [3] This includes the second-order developments, the generation of an optimal feedback control law, the 
conditions of optimality, and the treatment of state constraints. Specific section|4]presents the optimal control update, 
and deals with the particular case of control constraints, which is important for the bang-bang problem class. Section|5] 
introduced a practical solution method to produce continuous solution for the problem, with an adaptive implicit mesh. 
The algorithm of the method is presented in section|6] Eventually, the last section, section|7]illustrates the method, and 
demonstrates that the method can find the bang-bang optimal control without requiring any insights on the switching 
structure. 



2 Problem Statement 
2.1 Preliminaries 

To simplify notation, the following conventions are used. Scalars are denoted in italics, while column vectors, matrices 
and tensors are in bold. In addition, column vectors use lowercase and matrices use uppercase. 
And for derivation, 

F - dF F 
x dx xy dxdy 

where F is a general scalar function. For clarity of the expressions, when the points of evaluation are omitted it is 
assumed that the expression is evaluated around a nominal trajectory [x(f),u(f),p]. The dimension of the state x(f) is 
n x , and the dimension of control u(f ) is n u . 



2.2 Formulation 

The developments include the case where a scalar constant parameter p is included and needs to be optimised along 
with the control. 

The dynamics are described by an ordinary differential equation 

dx 

— = t(x,u,p;t), te[ta,t f ], u(t)eU, P eP (1) 

= <p(x(r ),x ,p ) (2) 

where q> : W x xl-> Wf describes initial conditions on the state at date to and unknown parameter po. Functional f, 
of state function x e C([to,tf],W x ) and unknown constant parameter p €P C R, is assumed at least twice continuously 
differentiable, f G (R"* xUxP), with the Sobolev space 

W*(fl) = {/ G L 2 {Q.) : Ma G N, \a\ < k,d a f G L P (Q.)} 

Furthermore, the control is assumed a bounded measurable function, u G L 2 ([tQ, tf],U). Let the control region 
U C W" be defined by the box constraints: 

u\ < m <u" \<i< n u (3) 

where u\ G R and u" G R are respectively the lower and upper bounds on the i th component of the control. 
Let terminal constraints be defined as 

V{x{t f lp f )=Q (4) 
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where \p : W* xR-> M."v is assumed twice continuously differentiable. Scalar pf is an unknown parameter. With very 
few modifications, these problem parameters can also be slack variables of possible terminal inequality constraints. 
Consider the minimisation of an objective function J written under the Mayer form, 

J = J(x(tf),p f ) — > min (5) 

where J : R"- r x R — > R is a functional of the vector x(f^) and the parameter p. It is assumed well defined (Jacobian of 
maximum rank), and at least twice continuously differentiable. 

A second order method is presented to minimise the objective function J, Eq. (0, subjects to the state constraints 
Eqs. (O and (0), and the control constraint Eq. (01, with iterative updates on the control u(f ) and the parameters p,po 
and pf. 

3 Second Order Gradient Method 

3.1 Second Order Equations 

For this problem, an extended terminal value function is introduced, 

^{x,V,p f ;t f )=J(x(t f ),p f ) + V T ^(x(tf),p f ) + Y(^(tf),Pf) T CY(^(tf),Pf) (6) 

where V G M"v is a non-zero constant Lagrange multiplier vector for the constraints and C > is a square diagonal 
regularisation matrix, C G M"i'' n f(R). Let the augmented Lagrangian ££ G W 2 2 (K"* X U x P x R 2 x R"v) of the 
constrained optimal control problem be defined with 

J<f(x,u,p,po,p f ,v) = (t>{x,v,pf,t f ) + ri T q>{xo,po;to) + X T (f(x,u,p;t) -^J dt (7) 

It accounts for initial conditions Eq. ©, terminal constraints Eq. (01 and objective function Eq. ((5]). Costate vector X 
and Lagrange multiplier V have been introduced for the dynamics and the initial constraints Eq. (0 respectively. 

While most second-order methods published to date only use penalisation to reach constraint satisfaction, the 
presented method introduces Lagrange multipliers to have a quicker and better satisfaction of the constraints. As 
part of a min-max problem, V should be minimised for u and p, while maximised for V. Indeed, the constrained 
maximisation supposes the existence of a min-max and max-min problem, or also the existence of a saddle point. The 
regularisation matrix C of the augmented Lagrangian (Eq. (0) is thus used to regularise the Lagrangian and find a 
saddle point of L by reducing any duality gap. Gill l2Tl . Hestenesj22|, Powell[23| and others have demonstrated that 
solving the extended function of the unconstrained problem Eq. (0 is equivalent to solving the constrained problem 
Eqs.(g]-0. 

Let the Hamiltonian H G w£(W x x W lx x U x R) be defined as: 

H(x,X,u,p;t) = X{t) T {(x,u,p;t) (8) 

since the objective function J (Eq. |5j does not include Lagrangian terms (integral terms). 

Looking for control updates 8u, the unknowns of the problem are 8x, dp and dV, such that a control update can 
be expressed in the form 

Su = a + fiSx + ydp + codv (9) 
u = u + e„5u (10) 

with a G R"" , j3 G M*"'"* (R), y G W" and 0) G M"" >" v (R) . u is the current nominal control and e„ G [0 , 1 ] is an update 
factor. This control update can be seen as a feedback control law on the state x. 

Since /, f, q> and ^ are assumed at least twice continuously differentiable, the development of the Lagrangian 
Jz? G W 2 2 (R" r x U x R) to second order, around the nominal trajectory x and control u and nominal parameter values, 
yields, 
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d3*?(x,u,p,p ,pf,v)=d(l)+d(ri T (p) 



■X f Sxf + X 8xo 
8X T f8xf + 8Xq8xq 

■>f ( fdX + 8X\ T _ s . T dx\ . 

11 1 8x-8X — \dt+ I dHdt 



(11) 



with: 



(||5x|| 2 ,||5A|| 2 ,V) 



dH = H x 8x + H u 8u + H p dp + f8X 

+ ^8x T H xx 8x+^8u T H uu 8u+^H pp dp 2 



+ 5u r H ux 5x + 5x r H xu 5u + 8x T H xp dp+H px 8xdp 

+ 8u T H U pdp + HpuSudp 

+ SX T f x Sx + SX T f u Su + SX T f p dp 

+ Sx T f x SX + Su T f u SX + t p SXdp 

Residual term o(||5x|| 2 , ||5A|| 2 ,dp 2 ) describes the higher order terms of the development, and, 

o(\\8x\\ 2 ,\\8X\\\dp 2 ) = AJ? -dJtf(x,u,p, P o,p fl v) (13) 
A3? = 3?(x,vi,p,po,p-f,v)-3?(x,VL,p,po,pf,v) (14) 

AJ>f is a measure of the improvement owing to the updated control law (Eq. ([Toll). The second term of the right 
hand side must thus be of the same order of magnitude of A«5f . For linear or quadratic problems in x, u and p, the 
residual term is zero. 

Lemma 1 (Riccati Transformation). The costate vector is related to the other variables with the transformation, 

SX = RSx + KdV + Tdp (15) 
with R € R n *^(R), K € (K) and T E R l > n *(R). 

Proof. The state transition matrix <f>(tf,t) for the dynamics f yields, 

x(tf)=&n(tf,t)x(t)+Q 12 (tf,t)X(t)+Qi3(tf,t)p (16) 

X(t)=Q 2 l{t,tf)x(tf)+$22(t,tf)X(t f ) + &23(t,t f )p (17) 

P = *n(!,tf)x(tf) + <h2(t,tf)X(?f) + <h3(t,tf)p (18) 

The transversality conditions for the terminal constraints provide the value for X(tf). The transition matrix is 
invertible. Combining equations (TToT l and ( TP7I ). and considering small variations of the variables x, p and V yield: 

8X{t) = (1 -<l>2l(f,f / )^12(f/,f)) _1 *2l(f,f/)4 > ll(f/,0^(f) 

+ (l-4>2l(f,f/)4>l 2 (f/,f)) _1 (^ > 2l(f,f/)4>13(f/,f)+ < f23(f,f/))5/? 

+ (l-<D 2 i(f,f/)4>i2(f/,f)) _1 *22(r,r/)v^/v 

□ 

Assumption 1. Consider a nominal trajectory x and a nominal control u, possibly not optimal. The solutions of 
Eqs. A25\ \26\27\ are bounded with the following assumptions 124)1 H25VA2 6V : 
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• Legendre-Clebsch condition H uu > 0. This discards any problems where the control appears linearly. A proce- 
dure that guarantees this condition in the method shall be presented hereafter. 

• H xx — HxuHmjHux is positive-semidefinite. 

• R(f/) is positive-semidefinite 

The first assumption is a necessary condition to ensure that no singular arcs are encounter. However, this has no 
influence on the local optimality of the final solution. 

Definition 1. Considering a quadratic model (Q,R), the Trust-Region problem consists in minimising the quadratic 
expansion for the solution to be within a given ball B(x, r), 

nfin^f + p^R+ip^Qp 
s.t. 

N<A 

where p = 8x and A is called the Trust-Region radius. Furthermore, the solution p* is solution of linear problem 

(Q + Al)p = R 
for some A > such that Q + A I is positive definite. 

Applying the Trust-Region algorithm[27] |28||29] results then in the computation of a shift matrix S uu = Al such 
that the Hessian H uu H uu + Al is positive-definite. A counterpart of applying the Trust-Region method to correct 
the Hessian is a limitation of the control update amplitude 8u in accordance to the selection of Trust-Region radius A, 
which eventually has an influence on result of Eq. < TT~4b . 

Transformation of Eq. ( TT3T > allows to compute the feedback coefficients in Eq. ©. This yields, assuming H uu > 0, 

a = H uu 'H u (19) 

j3 = -H uu - 1 (H xu + R r f„) r (20) 

r=-H uu - 1 (H pu + T T f u ) r (21) 

fi) = -H uu - 1 (K r f u ) r (22) 



And, using the conditions of optimality, ordinary differential equations for R, K, T and the costate vector X are 
obtained. Thus, 



dk T 

dt 
dx 

dt 
dR 

~~dt 
dK 

dt 
dT 

dt 



: -H x + HuHuu- 1 (H ux +H uk R) (23) 

H A r = f (24) 

: H xx + R T f, + f r R + p T (H ux + f„R) (25) 

K^ + tO^^ + f^R) (26) 

: H px + T T f x + y T (H ux + f u R) + f p R (27) 



These ODEs allow of computing a bounded update 8u of the current nominal control u through Eq. ©. If the 
truncation error defined by Eq. (TPfl) is limited, this control update should minimise the Lagrangian of the problem and 
to some extent the terminal constraints violation and/or the terminal cost J. 

Additionally, to decrease the constraint violation, an update of the Lagrange multiplier V should be provided, and 
similarly, an update for the parameter p should be produced to minimize the Lagrangian. 
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3.2 Parameter and Lagrangian Vector Updates 

Proposition 1. The following transformations are considered 

dY=K T dx + Qdv + \dp (28) 
dt p = T T 8x + V T dv + Wdp (29) 

This can be demonstrated by computing the ODEs and identifying the similarities with previous ODEs. 
This yields additional ODEs: 

dV 

-— = K r (f„y+f p ) (30) 

-^=K r M> (31) 
dt 

dW 

dt 



= H pp + yH up + T T (i u Y+ f P ) + f P T (32) 



Eventually, assuming W(fo) < 0, the problem parameter update is, 



dp = -Vf(f )- 1 i Vp (33) 

Also, assuming Q(fo) -1 exists, lagrange multipliers are updated with the linear update 

dv = e v Q(t y l (50 - V(t )dp) (34) 

If Q(to) < 0, dv defines a direction for the constraints reduction, and e v € [0, 1] is thus determined to minimise the 
constraints ip. 

The ODE system (Eqs. (I30l-l32l) is integrated backwards from some terminal conditions depending on the terminal 
constraints \p and the terminal cost 7, while the state dynamics (Eq. (Q~|)) are integrated forward. This results in 
numerically decoupled schemes, which yield to numerical robustness. 

3.3 Terminal Conditions 

The formulation allows the handling of various types of constraints without changing of the backward integrated ODE, 
since the treatment of constraints has an influence only on the terminal conditions. 

The following terminal conditions are used to integrate backwards the ODEs (Eqs. ( 1251 ). ( 126*1 ). d27l) . d3"0i >. ( f3"Tl ). 
63): 

R(f/) =J«(f/) + ♦„('/) C 35 ) 

K(f/) = $ vx {t f ) (36) 

Q(?/)=0 (37) 

T(f/) = 4> xp (tf) (38) 

V( f/ ) = 4> vp (t f ) (39) 

W(?/) - 4> pp (t f ) (40) 

X(t f ) = t x (tf) (41) 

The satisfaction of the constraints depends on the Lagrange multiplier V. 

Lemma 2. Assuming H^ 1 is positive definite, the Jacobian y x (x(f/-)) of the constraints with respect to the terminal 
state Xf has full rank tiy, and Q(fo) must be negative definite, then the Lagrange multipliers update dV gives the 
(local) direction of minimisation of the constraints l/A. 

Proof Seed. □ 
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3.4 Parameters Determination 



The unknown parameters for the initial conditions and the dynamics do not follow the same update rule as of dp. 
Because the equations of the sensitivity matrices, (R, K, and Q) and (T, V, and W), are integrated backwards, setting 
the initial constraint <p does not change the terminal conditions of integration Eqs. d35l [36l [37l [381 [391 and [40b . 
Assuming the initial constraint <p is well defined (maximum rank condition), the update formula for the parameters p 
is 

dpo = -(p p l p (p p (42) 

There is no need to update the Lagrange vector n introduced in Eq. (fTTT) . One may want to include a relaxation 
factor to control the update 8po. 

Likewise, for the parameter part of the terminal constraint, 

dp f =-Ypp¥p (43) 



4 Treatment of Control Constraints 
4.1 Acceptance of Control Update 

Once the backward equations are integrated from their respective terminal conditions, and around a nominal trajectory, 
the control update is computed using Eqs. (O and ( TTOb . Note that according to Eq. d34l ). 8u depends on e v . 

Both e v and e„ permit of adjusting the control update to yield an improvement of the extended value function, and 
to ensure that the second order developments truncation error (Eq. (TBI ) is small enough. If Lagrange multipliers V are 
not used, a linesearch procedure is used to find e„ that minimizes the merit function. When e v is used, the direction of 
minimization is given by a linear combination of a and dV. One approach is to perform linesearches with e u and £ v 
successively, the purpose being to find quickly an improvement of the merit function, regardless whether it is or not 
an strict minimizer. 

Indeed, as in [9|, the validity of the second order development can be checked using 

AL = f f [H(x,k,U,p,t)-H(t,X,U,p,t)]dt (44) 

AL « (AJ*f) dv=0 (45) 

Clearly, 

lim AL = lim (A_Sf ) rfv _ n = (46) 

£„^0 £„^0 ' dV - U 

For £„ small enough, AL gets close to zero thus ensuring the second order developments are assumed valid. If in 
addition, the value AL of Eq. (l44l is negative then the control update du can be accepted. 

Under these conditions the method generates iterates of control u'(f ) that minimize the Lagrangian Jzf. 



4.2 Bang-Bang Control and Control Constraints 

According to Pontryaguin Maximum Principle|6|, when the control belongs to a compact set U, for the problem class 
considered where the control appears linearly in the Hamiltonian, the optimal control has a bang-bang structure. Obvi- 
ously, this result does not come from the variational equations (that seek V U H — 0), but directly from the maximisation 
of the Hamiltonian over the compact set. In the problem formulation, and Eq. (O, closed set on the control is assumed. 
However, the development of the method, referred to calculus of variations, assume the Lagrangian is differentiated 
on an open set. 

Constraints on the control can be set such that the control is implicitly limited. One approach is to formulate the 
control constraint (Eq. [3j using a barrier function p that is incorporated into the cost [ 1 3 1 [ 1 5 1 . Thus, when the control 
violates bounds at any time, the solution gets penalised and eventually the solver corrects the violation. This approach 
seems to provide practical results, and also as the control bounds may not be respected in the initial iterations of the 
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solver, the solver can go through a forbidden space of the search space and possibly find solutions with that extra 
controllability. This approach, however, does not ensure the optimal control solution is an admissible control (e.g. 
u i U). 

The approach considered here is to explicitly force the control to not exceed the desired limits. The principle is to 
identify a continuous function m : R — >• U and to introduce a slack variable v, such that the box constraint Eq.[3]can be 
replaced with the explicit equality constraint 

Ui-m(vi) = (47) 

Indeed, one can thus replace m, accordingly in the developments, and solve the problem for v, directly instead of 
iij. Since the compact set U is then not used, the gradient V V H can be used for the computation of the control updates, 
and this also provides an indication of the optimality of the control u (Weierstrass condition). Such an application m 
can be for instance, 

m(vj) = u\ + (w" — u l j) sin 2 Vj 

This approach, although numerically more challenging than penalization, makes sure that the control satisfies the 
problem description and never violates the box constraints Eq. ([3]). 

4.3 Heuristic for Convergence of Bang-Bang Problems 

Some transformations m : K — » U may allow, however, triviality or singularity to the problem. For instance, when 
Lagrange multipliers V are not used, for the i-th component of the control u, some transformations m yield, 

3t e [to,t f ],m(vi(t)) = => (H Vi (t) = 0,8v,(t) = 0) (48) 

No updates are made even though the control solution is not optimal. 
A diagonal matrix D > is thus introduced in the Hamiltonian, 

H(x,X,u,p;t) = A r f(x,u,p;f)+u r Du (49) 

This is equivalent to adding a quadratic Lagrange term in the cost / ~J(x(tf),u,pf). As a result, control u is smoothed. 
To obtain a bang-bang saturated control, D is updated and decreased periodically with the iterations for the control to 
be in a neighbourhood of an optimal bang-bang control solution. 

5 Continuous Solution Approximation 

5.1 Problematic 

To compute the feedback control coefficients, (X, j3, y and fi), a backwards integration (e.g. Eqs. 1251 . d26i>.l27b is done 
around a nominal trajectory {x,/?} that results from a forward integration. It is thus not possible to solve all the ODEs 
concurrently. It is necessary to store continuous solution of the ODEs, that is both the trajectory (e.g. x and X), and 
the sensitivity matrices (e.g. R, K) to then compute the control update. To respect the continuous developments of 
previous section, an efficient and reliable continuous data storage mechanism has to be devised. 

5.2 Current Discrete Approximation Approaches 

Often, the alternative is to work with the best discrete time approximate solutions and the definition of a time mesh 
where the solution can be evaluated. 

Many approaches have been proposed in the literature to produce a discrete control time mesh. The approach of 
keeping the control constant for fixed durations [15] [13] produces good results when the number of nodes has been 
well selected. Likewise, the control can be kept constant over multiple Runge-Kutta time steps for smooth control! 16]. 
But, since the control is used in the forward integration but not during the backwards integration, using the Runge- 
Kutta (RK) time steps alone does not seem appropriate for the bang-bang problem considered. Runge-Kutta integration 
is only time-reversal when used with a constant step size ll30l . Others! 3 1 ] [ 32 1 propose a refinement procedure based 
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on an integer programming problem, and consider the minimisation of the continuity error and the minimisation of 
number of the points to add. 

When the control is of a bang-bang structure, an other approach should be followed, because a constant time mesh 
for the control is likely to miss the switching times of the optimal control. The usual approaches increase the number 
of points close to the discontinuity. This has a major drawback of increasing the number of data points to store. 

The choice of a discrete, or multi-stage, formulation could be argued in regards to potential applications, since 
it is always simpler to implement a piecewise control. From a theoretical point of view, however, it is necessary to 
propose a scheme for limiting the approximation made on the control. Properties of the continuous control solution 
(e.g. switching structure, singular arcs) can eventually be used to produce a good discrete control solution. 



5.3 Continuous Trajectory Approximation 

The trajectory and the control are initially sampled at N given node points that can be described by a grid f, G [f0)*/]> 
i = 1..N. The trajectory and the control are thus exactly defined at the nodes, and an approximation to the continuous 
solution is constructed between the node points. Consider an explicit adaptive step-size Runge-Kutta integration 
method. It is the baseline for solving the ODE systems in the current paper. An explicit Runge-Kutta integration 
method uses the following iterative equations for integrating, 



y»+i = y„ + /i £ /?,-k; 



k; = f f t n + Cih, y„ + h £ flyk/J , 



(50) 
(51) 



where ay, bi and c, are constant scalars, and s is the order of the method. Shampine's method for dense output 
computation in Runge-Kutta computation is used[33|. One can introduce a polynomial S(x), and an error function 
e(x) such that, 



y(x)=S(x)+e(x) 
dx ax 



(52) 



e(* ) = 



Shampine fl33l thus defines polynomials P(x) of maximum degree m + 1 (m being the number of sub-integrations 
in one subinterval of length h), 

dP(x) 



dx 



^/f iOT (jf)f(jCi,ei + S(jfi)) 



(53) 



where Z, m are Lagrange basis polynomials of kth-order m. It is demonstrated there exists an integer n, and constants 
c\ and C2 m such that 



d 2 y(x) dS(x) 



<c,h n 



d 2 x d 2 x 

\\y(x)-nm<c2,, n h 2+mm ^ n+ ^ 



(54) 
(55) 



That is, P is a better higher order approximation of y that S is. An iterative scheme is used to construct P(x). The 
procedure starts with a simple Euler integration model for S (x) with m = 1 , and then increases the order m by evaluating 
successively S(x) and P(x). The construction of S and P is dependant of the extrapolation method (e.g. Runge-Kutta 
method), and P is only then available at the end of the integration over the interval of length h. 
For a fifth order explicit RK method, Vx £ [x B ,x„+i], 



y(x) = n +x(r 2 + (1 -jc)(r 3 +i(r 4 + (1 -i)r 5 ))), 



(56) 
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where 



ri =y„ 



(57) 



S 



r 2 = hY J bikj 



(58) 



r 3 =hk\ - (y„+i -y„) 
r 4 = -/z(k 7 +ki) 

rs = ft(djki + d^kj, + dukn + d^ks + d(,k(, + d-jk-j) 



(59) 
(60) 
(61) 



and di are constants of the method. Globally, the solution for the state trajectory and the sensitivity matrices can be 
stored as a "if approximation. For a fifth order RK method, five coefficients for each element of the ODE system, plus 
the range of validity of the polynomials are stored. 

Consequently, during the forward integration, the trajectory state x(f ) is stored continuously, using Eq. (l56t . and 
this continuous approximation is used during the backward integration. Likewise, during the backward integration, the 
costate X (f ) and the sensitivity matrices R(f ) , K(f ) , T(f ) , V(f ) continuous approximation are stored. Thus, all matrices 
and vectors involved in the computations follow a high resolution continuous approximation. Eventually, the backward 
stage being fully continuous, a continuous control update 8u is available. 

Additionally, to obtain the best trajectory and the most accurate optimal control, from the continuous control 
update 8u, an approach should be devised to construct a continuous approximation of the nominal control u. 

5.4 Continuous Nominal Control Approximation 

The continuous control approximation is constructed a-posteriori, from the results of the backward integration. In 
regards to Eq. ( [Tol l, the continuous control update should be added to the nominal control u to provide an updated 
control u that can be used as the new nominal control for the following iteration during the backward integration. 
Consider a nominal control u, during the forward recursion, when integrating ODE Eq. (fTJ. According to section 15731 
control update 8u can be constructed accurately using RK interpolant of matrices R, K for all t e [to, tf], and likewise, 
an accurate updated control u can be computed. This online produced nominal control has however to be stored, such 
that it can be used again for the backward recursion and future iterations. Practically, this is a difficult problem since 
neither u nor 8u have a readily available closed form, in general. 

Consider % the RK nodes of the forward integration with the control u + £ u 8u with values u(fc). The problem is 
to completely determined u such that 



where J and <j) are, respectively, the target merit function value and the constraints, computed with the improved control 
u + 8u. Tji and t]2 are given small tolerances. For very sensitive problems, the discrepancy between this new control 
u and the control ti + £ u 8u can result in large difference in the merit function value. This difference can possibly be 
bigger than the expected improvement (Eq. (144-b ) thus resulting in a failure of the method to converge properly. 

Each component of u is thus approximated between the coarse grid nodes to follow a piecewise-polynomial func- 
tion, or spline (linear function, cubic spline interpolant). Intermediate nodes at can be tested, and added if it 
results in a better approximation of u + £ u 8u. Eventually, u — > u. 

6 Second Order Algorithm and Implementation 

6.1 The Algorithm 

The algorithm is implemented (C-SOGO) and coded in C++. It uses efficient linear algebra classes with shared- 
memory parallelisation. A high level description of the algorithm is the following: 



|7(il + ei,5u)-/(u)|<Tii 

||0(u+£„5u)-<Hu)|| < m 



(62) 
(63) 



Given: 
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• convergence tolerance on the constraints rj v , and the optimality condition rjn- 

• Trust-Region radius for the control A M > 0. 

• Trust-Region radius for Lagrange multipliers A v > 0, and parameter A p > 0. 

• a nominal control u(f ) for t G [fo,f/]- 

• a nominal Lagrange multiplier V G R"v, 

• a regularisation matrix C G R"V'"v. 

• initial state xo = x(fo). 

Step Initialisation 

Step O.a. Initialise iteration counter. 
Step O.b. Set e* = and e* = 0. 

Step 1 Nominal trajectory 

Step La. Using a nominal control u(f ), compute the state trajectory x(f ). 
Step l.b. Compute nominal merit function value L. 

Step 2. Check terminal criteria 

Compute the terminal constraints \jf Eq. ©. If constraints \\y\\ > rjy go to Step 3. 
If optimality condition max (AL, > 1]h go to Step 3. 

Otherwise, an optimal solution has been found. Stop. 

Step 3 Backwards Step 

Step 3.a. Computation of the terminal conditions with Eqs. [36] [37] [35]and Eqs. [38] |40j [36] 

Step 3.b. Computation of the Shift matrix S uu thanks to the Trust-Region algorithm (see definition [TJ for 
H„u(?) + S ul] (?) > for all t G [fo,f/]. The Shift matrix S uu shall ensure the boundedness of the ODE 
solutions. 

Step 3.c. Integration of Eqs. d25l [26l and|31t and computation of continuous time approximation of A(t), R(t), 
K{t) and Q(t) for all t. During the integration, H„„ is shifted with S lm . As the equations are integrated, in 
accordance to sec. 15.31 a mesh and polynomials are constructed to provide a continuous approximation of 
them. 

Step 3.d. Likewise, if the problem includes a parameter, integrate Eqs. (|27| |. (l30l l. and d32l > and compute con- 
tinuous time approximations of T(t), V(t) and W(t) for all t. 

Step 4 Computation of Lagrange multipliers V 
Step 4.a. If Q(f ) < go to 4.c 

Step 4.b. Apply Trust-Region algorithm to have a negative definite Q(?o). 

Step 4.c. Computation of Lagrange multipliers V for the constraints using Eq. (f3~4-b . 

Step 5. Forward Integration Loop 
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Step 5.a. Integrate the dynamical state equations Eq. (Q]l with the control u + £„5u, where the control update 
5u is computed using Eq. ( fTOb . The continuous approximation of sensitivity matrices R(r ), K(f ), and Q(r) 
accurately provide the feedback coefficients a, j5 and y. To produce the improving control update 5u 
algorithm Al is used to find £„ and e v that minimise the merit function Eq. ©. 

Step 5.b. Evaluation of the constraints y(x;£/), the objective function J(x;tf), and the merit function value 
L(x,V,t f ). 

Step 5.c. If improvement in merit function, L < L, take e* = £„ and £* = e v . Go to step 6. 

Step 5.d. Improvement could not be found. Conditions of validity of the second order development might not 
be meet. Reduce Trust-Region radius A u . Increase Iteration counter. Go to step 3. 

Step 6. Accept current control. 

Step 6.a. Constructing a continuous time approximation of the new nominal control u <— u + £*,Su. 
Step 6.b. Update Lagrange multipliers, V = V + e^dv. 
Step 6.c. Update nominal merit function value L. 
Step 6.d. Update regularisation matrix C. 



Step 7. Continue with the next iteration. Increase iteration counter. Go to step 2. 

Algorithm Al is either a line-search method to get £„ G [0,1] that minimises S£ when V is not used, or otherwise, a 
simple two-dimensional [£„,£ v ]-grid search method to get the additional parameter £ v G [0,1]. On the current iteration, 
linesearch, or gridsearch, is stopped when a sufficient decrease is obtained. 

Convergence is achieved when both the constraints ^ and the optimality norm are satisfied to a given tolerance. A 
norm of optimality can be given by Eq. (l44t and H„. 

7 Examples 

The initial motivation of this work has been the optimization of low-thrust space trajectories, which present a bang- 
bang optimal control structure. Both presented examples have affine control dynamics with the control defined on a 
closed set, and thus the optimal control should be bang-bang. 

7.1 Double Integrator Problem 

The dynamics 



f(x,u;f) 



dt 



p 




V 


V 




u 



where p is the 1-D position of the particle, and v is its velocity. The control bounds are 

- 1 < u < 1 

Owing to the control constraints, the transformation u = 1 — 2 sin 2 v is used, and v is the new control to seek. 
Terminal constraints are 

>(*/)= 0" 



v(f / )=0 



(64) 



(65) 



(66) 



And the initial conditions are p(0) = 1, v(0) = 1. 
The objective is to minimize the total time,. 



J(x;tf) = tf — > max 



(67) 
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The time of flight tf is thus considered as a parameter of the problem. The integration of the equations is thus done 
through a change of variable T = t /tf such that the integration is done on [0, 1] with varying jacobian dx. That is we 
should use 

?(x,u,t / ;T)=f(x,u;« / )f / (68) 
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Figure 1: Iterates of a simple control problem. The coarse grid includes 3 node points. Left, continuous control 
iterates. Right, gradient of the Hamiltonian with respect to the continuous control. 

Figure Q] shows the iterates of program to find the solution. The control is approximated with a constrained cubic 
spline to prevent overshoot at intermediate points (with constrained cubic splines, the slope is imposed rather than the 
curvature). At the plateau of the control, in figureQ] the control is thus approximated with straight lines. 

Using as initial guess tf = 3.5 we found an optimal time fi = 3.449581. 

One can note how the number of points varies. The comparison to the smoothness of the control or the optimality 
of the solution is not straightforward, however. 

It is arguable that the control obtained with the method is not strictly bang-bang in the sense that it can reach 
intermediate values on a set of measure not null. However, from a numerical point of view, for an optimality tolerance 
small enough the error to the true bang-bang solution should be negligible. And, possibly, the solution obtained can 
be refined in a later stage with the switching time as parameter since an optimal control structure is then known. The 
result of the refinement is likely to be negligible if the optima 
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7.2 Simple Orbital Transfer Problem 



The problem is the one of transferring a spacecraft from one circular orbit to an other with higher radius. The terminal 
orbit is sought in the same plane as the initial orbit, and thus the dynamics is simplified to be two-dimensional. The 
spacecraft is propelled with continuous and constant thrust. With the gravitational constant ji, the dynamics are then, 



f(x,u; f ) = ^ 



r 








ve 




m 





r r L m 
_VrV6_ + F § 

_!_ 



(69) 



State elements r, v r , vq and m denote respectively the radial position, the radial velocity, the ortho-radial velocity 
and the mass of the spacecraft. F is the thrust force (0 < F < Fq = 5N) and V e is the exhaust velocity (V e = goI sp = 
19612m /s). The controls are thus the steering angle u and the thrust amplitude F. 

The initial orbit is circular of radius ro = 20000 km. The constraint is to be on a circular orbit of radius rf = 42000 
km at final time. Using the cylindrical coordinate, this yields the constraints, 



#(x;f/) = 



ve - 



(70) 



l(r-r f )\ 

The time of flight is fixed to tf — 4 days. The initial mass is 1000 kg. The objective is to maximize the final mass. 

J(x;tf) — m(tf) —> max (71) 

The final optimal mass is m(tf)* — 932.15 kg and the spacecraft make about 8 revolutions around Earth. The 
optimal control is depicted on figure [3] and the optimal trajectory is on figure |2] As can be seen on figure [3] the 
optimal control is of bang-bang type. The solver is able to find accurately the commutation points, while no insight of 
the control structure is initially provided. Such problem has usually many local optimal that can be sorted with respect 
to the number of revolutions. That is some solutions may have more, or less, than 8 revolutions for a constant time of 
flight. In general however, the best control is to thrust at the perigee (where the cost and the terminal constraints are 
more sensible to thrust) and coast at the apogee, as depicted in figure |2] 
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Orbit problem 




Figure 2: Optimal trajectory of the orbital transfer problem. 
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Figure 3: Control of the orbital transfer problem. 
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8 Conclusions 



The solving of optimal control problems with a continuous backward-forward sweep algorithm based on a second- 
order variations is presented. A second order expansion of a Lagrangian provides linear updates of the control to 
construct a locally feedback optimal control of the problem. The control updates are computed following backward 
and forward stages. Ordinary differential equations are integrated backward around a state trajectory that has been 
computed forwardly in an earlier stage. The method uses an accurate continuous approximation of the ODE solutions 
to ensure precision of the control updates. 

The method was demonstrated on two examples with bang-bang optimal control solution. It was shown that the 
solver can find accurately the switching structure of the solution, without any insight. 
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